55 research outputs found

    Data mining in bioinformatics using Weka

    The Weka machine learning workbench provides a general-purpose environment for automatic classification, regression, clustering and feature selection, all common data mining problems in bioinformatics research. It contains an extensive collection of machine learning algorithms and supports data exploration and the experimental comparison of different machine learning techniques on the same problem. Weka can process data given in the form of a single relational table. Its main objectives are to (a) assist users in extracting useful information from data and (b) enable them to easily identify a suitable algorithm for generating an accurate predictive model from it.

    Jumble Java Byte Code to Measure the Effectiveness of Unit Tests

    Jumble is a byte code level mutation testing tool for Java which inter-operates with JUnit. It has been designed to operate in an industrial setting with large projects. Heuristics have been included to speed the checking of mutations, for example, noting which test fails for each mutation and running this first in subsequent mutation checks. Significant effort has been put into ensuring that it can test code which uses custom class loading and reflection. This requires careful attention to class path handling and coexistence with foreign class-loaders. Jumble is currently used on a continuous basis within an agile programming environment with approximately 370,000 lines of Java code under source control. That environment checks out project code every fifteen minutes and runs an incremental set of unit tests and mutation tests for modified classes. Jumble is being made available as open source.
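
The test-ordering heuristic mentioned in this abstract can be illustrated with a minimal Python sketch. It is not Jumble's implementation (Jumble is a Java tool); the class `MutationRunner`, its fields, and the rule of remembering only the most recent killing test are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

# A test takes a mutant identifier and returns True if it passes against it.
Test = Callable[[str], bool]


class MutationRunner:
    """Hypothetical runner: tries the test that last detected a mutation first."""

    def __init__(self, tests: Dict[str, Test]) -> None:
        self.tests = tests
        self.last_killer: Optional[str] = None  # name of the last failing test
        self.runs = 0  # number of individual test executions so far

    def _ordered(self) -> List[str]:
        names = list(self.tests)
        if self.last_killer in self.tests:
            # Move the previously failing test to the front of the queue.
            names.remove(self.last_killer)
            names.insert(0, self.last_killer)
        return names

    def is_killed(self, mutant: str) -> bool:
        """Return True as soon as any test fails against the mutant."""
        for name in self._ordered():
            self.runs += 1
            if not self.tests[name](mutant):
                self.last_killer = name
                return True
        return False
```

Because checking stops at the first failing test, putting a test that recently failed at the front tends to reduce the number of test executions per mutation.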

    Water availability and agricultural demand: An assessment framework using global datasets in a data scarce catchment, Rokel-Seli River, Sierra Leone

    Study region: The proposed assessment framework is aimed at application in Sub-Saharan Africa, but could also be applied in other hydrologically data scarce regions. The test study site was the Rokel-Seli River catchment, Sierra Leone, West Africa. Study focus: We propose a simple, transferable water assessment framework that allows the use of global climate datasets in the assessment of water availability and crop demand in data scarce catchments. In this study, we apply the assessment framework to the catchment of the Rokel-Seli River in Sierra Leone to investigate the capabilities of global datasets complemented with limited historical data in estimating water resources of a river basin facing rising demands from large scale agricultural water withdrawals. We demonstrate how short term river flow records can be extended using a lumped hydrological model, and then use a crop water demand model to generate irrigation water demands for a large irrigated biofuels scheme abstracting from the river. The results of using several different global datasets to drive the assessment framework are compared and the performance evaluated against observed rain and flow gauge records. New hydrological insights: We find that the hydrological model simulates both low and high flows satisfactorily, and that all the input datasets produce similar results for water withdrawal scenarios. The proposed framework is successfully applied to assess the variability of flows available for abstraction against agricultural demand. The assessment framework conclusions are robust despite the different input datasets and calibration scenarios tested, and can be extended to include other global input datasets.

    Perspectives on open access high resolution digital elevation models to produce global flood hazard layers

    Global flood hazard models have recently become a reality thanks to the release of open access global digital elevation models, the development of simplified and highly efficient flow algorithms, and the steady increase in computational power. In this commentary we argue that although the availability of open access global terrain data has been critical in enabling the development of such models, the relatively poor resolution and precision of these data now significantly limit our ability to estimate flood inundation and risk for the majority of the planet’s surface. The difficulty of deriving an accurate ‘bare-earth’ terrain model due to the interaction of vegetation and urban structures with the satellite-based remote sensors means that global terrain data are often poorest in the areas where people, property (and thus vulnerability) are most concentrated. Furthermore, the current generation of open access global terrain models are over a decade old and many large floodplains, particularly those in developing countries, have undergone significant change in this time. There is therefore a pressing need for a new generation of high resolution and high vertical precision open access global digital elevation models to allow significantly improved global flood hazard models to be developed.

    Development of the Global Width Database for Large Rivers

    River width is a fundamental parameter of river hydrodynamic simulations, but no global-scale river width database based on observed water bodies has yet been developed. Here we present a new algorithm that automatically calculates river width from satellite-based water masks and flow direction maps. The Global Width Database for Large Rivers (GWD-LR) is developed by applying the algorithm to the SRTM Water Body Database and the HydroSHEDS flow direction map. Both bank-to-bank river width and effective river width excluding islands are calculated for river channels between 60°S and 60°N. The effective river width of GWD-LR is compared with existing river width databases for the Congo and Mississippi Rivers. The effective river width of the GWD-LR is slightly narrower than in the existing databases, but the relative difference is within ±20% for most river channels. As the river width of the GWD-LR is calculated along the river channels of the HydroSHEDS flow direction map, it is relatively straightforward to apply the GWD-LR to global- and continental-scale river modeling.

    Efficient incorporation of channel cross-section geometry uncertainty into regional and global scale flood inundation models

    This paper investigates the challenge of representing structural differences in river channel cross-section geometry for regional to global scale river hydraulic models and the effect this can have on simulations of wave dynamics. Classically, channel geometry is defined using data, yet at larger scales the necessary information and model structures do not exist to take this approach. We therefore propose a fundamentally different approach where the structural uncertainty in channel geometry is represented using a simple parameterization, which could then be estimated through calibration or data assimilation. This paper first outlines the development of a computationally efficient numerical scheme to represent generalised channel shapes using a single parameter, which is then validated using a simple straight channel test case and shown to predict wetted perimeter to within 2% for the channels tested. An application to the River Severn, UK is also presented, along with an analysis of model sensitivity to channel shape, depth and friction. The channel shape parameter was shown to improve model simulations of river level, particularly for more physically plausible channel roughness and depth parameter ranges. Calibrating channel Manning’s coefficient in a rectangular channel provided similar water level simulation accuracy in terms of Nash-Sutcliffe efficiency to a model where friction and shape or depth were calibrated. However, the calibrated Manning coefficient in the rectangular channel model was ~2/3 greater than the likely physically realistic value for this reach and this erroneously slowed wave propagation times through the reach by several hours. Therefore, for large scale models applied in data sparse areas, calibrating channel depth and/or shape may be preferable to assuming a rectangular geometry and calibrating friction alone
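
The Nash-Sutcliffe efficiency used in this abstract to compare water level simulations has a standard definition, NSE = 1 - Σ(obs - sim)² / Σ(obs - mean(obs))². The Python sketch below computes it; the function name `nash_sutcliffe` and the input handling are illustrative assumptions, not code from the paper.

```python
from typing import Sequence


def nash_sutcliffe(observed: Sequence[float], simulated: Sequence[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    if len(observed) == 0 or len(observed) != len(simulated):
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # Squared error of the simulation against the observations.
    residual = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    # Variance of the observations around their own mean (not normalised).
    spread = sum((o - mean_obs) ** 2 for o in observed)
    if spread == 0:
        raise ValueError("observed series is constant; NSE is undefined")
    return 1.0 - residual / spread
```

Two models can reach similar NSE values with very different parameter sets, which is why the abstract also checks the calibrated friction values for physical plausibility.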

    Development of a global ~90m water body map using multi-temporal Landsat images

    This paper describes the development of a Global 3 arc-second Water Body Map (G3WBM), using an automated algorithm to process multi-temporal Landsat images from the Global Land Survey (GLS) database. We used 33,890 scenes from 4 GLS epochs in order to delineate a seamless water body map, without cloud and ice/snow gaps. Permanent water bodies were distinguished from temporal water-covered areas by calculating the frequency of water body existence from overlapping, multi-temporal Landsat scenes. By analyzing the frequency of water body existence at 3 arc-second resolution, the G3WBM separates river channels and floodplains more clearly than previous studies. This suggests that the use of multi-temporal images is as important as analysis at a higher resolution for global water body mapping. The global totals of delineated permanent water body area and temporal water-covered area are 3.25 and 0.49 million km² respectively, which highlights the importance of river-floodplain separation using multi-temporal images. The accuracy of the water body classification was validated in Hokkaido (Japan) and in the contiguous United States using existing water body databases. There was almost no commission error, and about 70% of lakes > 1 km² show a relative water area error < 25%. Though smaller water bodies (< 1 km²) were underestimated, mainly due to omission of shoreline pixels, the overall accuracy of the G3WBM should be adequate for larger scale research in hydrology, biogeochemistry, and climate systems, and it importantly includes a quantification of the temporal nature of global water bodies.
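
The idea of separating permanent from temporal water by observation frequency can be sketched in Python as follows. This is only an illustration: the function names, the per-pixel input format, and the 0.5 threshold are assumptions, and the G3WBM paper's actual classification rules are not given in this abstract.

```python
from typing import List, Optional


def water_frequency(observations: List[Optional[int]]) -> Optional[float]:
    """Fraction of valid observations flagged as water (1) for one pixel.

    None marks a missing observation (for example cloud or ice/snow).
    """
    valid = [o for o in observations if o is not None]
    if not valid:
        return None
    return sum(valid) / len(valid)


def classify(freq: Optional[float], permanent_threshold: float = 0.5) -> str:
    """Label a pixel from its water frequency; the threshold is an assumed value."""
    if freq is None:
        return "no data"
    if freq >= permanent_threshold:
        return "permanent water"
    if freq > 0:
        return "temporal water"
    return "land"
```

A pixel observed as water in one of four valid scenes would be labelled temporal water under this assumed threshold, while a pixel seen as water in every valid scene would be labelled permanent.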